SYNTHETIC INSECTICIDAL CRYSTAL PROTEIN GENE



FIELD OF THE INVENTION

This invention relates to the field of bacterial molecular biology and, in particular, to genetic engineering
by recombinant technology for the purpose of protecting plants from insect pests. Disclosed herein are the
chemical synthesis of a modified crystal protein gene from Bacillus thuringiensis var. tenebrionis (Btt), and
the selective expression of this synthetic insecticidal gene. Also disclosed is the transfer of the cloned
synthetic gene into a host microorganism, rendering the organism capable of producing, at improved levels
of expression, a protein having toxicity to insects. This invention facilitates the genetic engineering of
bacteria and plants to attain desired expression levels of novel toxins having agronomic value.

BACKGROUND OF THE INVENTION

B. thuringiensis (Bt) is unique in its ability to produce, during the process of sporulation, proteinaceous,
crystalline inclusions which are found to be highly toxic to several insect pests of agricultural importance.
The crystal proteins of different Bt strains have a rather narrow host range and hence are used
commercially as very selective biological insecticides. Numerous strains of Bt are toxic to lepidopteran and
dipteran insects. Recently two subspecies (or varieties) of Bt have been reported to be pathogenic to
coleopteran insects: var. tenebrionis (Krieg et al. (1983) Z. Angew. Entomol. 96:500-508) and var. san diego
(Herrnstadt et al. (1986) Biotechnol. 4:305-308). Both strains produce flat, rectangular crystal inclusions and
have a major crystal component of 64-68 kDa (Herrnstadt et al. supra; Bernhard (1986) FEMS Microbiol.
Lett. 33:251-265).

Toxin genes from several subspecies of Bt have been cloned and the recombinant clones were found to
be toxic to lepidopteran and dipteran insect larvae. The two coleopteran-active toxin genes have also been
isolated and expressed. Herrnstadt et al. supra cloned a 5.8 kb BamHI fragment of Bt var. san diego DNA.
The protein expressed in E. coli was toxic to P. luteola (elm leaf beetle) and had a molecular weight of
approximately 83 kDa. This 83 kDa toxin product from the var. san diego gene was larger than the 64 kDa
crystal protein isolated from Bt var. san diego cells, suggesting that the Bt var. san diego crystal protein
may be synthesized as a larger precursor molecule that is processed by Bt var. san diego but not by E. coli
prior to being formed into a crystal.

Sekar et al. (1987) Proc. Natl. Acad. Sci. USA 84:7036-7040; U.S. Patent Application 103,285, filed
October 13, 1987 isolated the crystal protein gene from Btt and determined the nucleotide sequence. This
crystal protein gene was contained on a 5.9 kb BamHI fragment (pNSBF544). A subclone containing the 3
kb HindIII fragment from pNSBF544 was constructed. This HindIII fragment contains an open reading frame
(ORF) that encodes a 644-amino acid polypeptide of approximately 73 kDa. Extracts of both subclones
exhibited toxicity to larvae of Colorado potato beetle (Leptinotarsa decemlineata, a coleopteran insect). 73-
and 65-kDa peptides that cross-reacted with an antiserum against the crystal protein of var. tenebrionis
were produced on expression in E. coli. Sporulating var. tenebrionis cells contain an immunoreactive 73-kDa
peptide that corresponds to the expected product from the ORF of pNSBP544. However, isolated crystals
primarily contain a 65-kDa component. When the crystal protein gene was shortened at the N-terminal
region, the dominant protein product obtained was the 65-kDa peptide. A deletion derivative, p544Pst-MetS,
was enzymatically derived from the 5.9 kb BamHI fragment upon removal of forty-six amino acid residues
from the N-terminus. Expression of the N-terminal deletion derivative, p544Pst-MetS, resulted in the
production of, almost exclusively, the 65 kDa protein. Recently, McPherson et al. (1988) Biotechnology
6:61-66 demonstrated that the Btt gene contains two functional translational initiation codons in the same
reading frame, leading to the production of both the full-length protein and an N-terminal truncated form.

Chimeric toxin genes from several strains of Bt have been expressed in plants. Four modified Bt2
genes from var. berliner 1715, under the control of the 2' promoter of the Agrobacterium TR-DNA, were
transferred into tobacco plants (Vaeck et al. (1987) Nature 328:33-37). Insecticidal levels of toxin were
produced when truncated genes were expressed in transgenic plants. However, the steady state mRNA
levels in the transgenic plants were so low that they could not be reliably detected in Northern blot analysis
and hence were quantified using ribonuclease protection experiments. Bt mRNA levels in plants producing
the highest level of protein corresponded to ~0.0001% of the poly(A)+ mRNA. In the report by Vaeck et al.
(1987) supra, expression of chimeric genes containing the entire coding sequence of Bt2 was compared to
that of those containing truncated Bt2 genes. Additionally, some T-DNA constructs included a chimeric NPTII gene
as a marker selectable in plants, whereas other constructs carried translational fusions between fragments
" 3.2 Tc he "pIS c-ne. mseocda. .eve s of »«.n -«ere produced -hen truncated 3£gene S ,r :us,on 
o. 3t2 anc .ne nkih 5 . e _ Qre-nr.ousa grown slants croducec -0.02% of the total 

so „o.e =ro:e,n as me ;o«n. or 3ug of tox, .per g . ins9c:icical ac:ivty ca* be 

sno «ec ,00% ^'^^^SSTSpi^ ic. that L same promoter «as usee to *rect 

2- ^ V,.d S ,e transgemc p*. -eaves were ,0 - 50 ,.r,es «~ 

^rz^^&y^"™"^ - ressicn - a a prot3,n ,n a svstsm 

Barton et aJ. (1987> p,am J™' 2 nce the Bt H r>, ♦.S kbgene.encoo.nga54Sam.no 

"oress,cn s,. represents a ,ow ,eve, * ^™ S^VS. Btt protem fused 

Various hybnd ^T'^ne a^^^ P™" s 

:o NPTII -ere proceed . « S. g * ^ ^23^?^^^ no, conta-ning *,s m.n.mum N- 

rSSm^rnoll 

structure of die Bt2 polypeptide is cisturbed .n truncated P»WJ« 

A number oTTesearchers have attempted to express plan! J B coli 

55:303-317; Rothstein fi a.. (1987) Gene ^ W ( ST!^^2Lnby« 

? £,22 S-^^S iSS-^S e, I. supra). — » 

(1987) Eur. J. Biocnem. loo-"' ^w- in uw w<w * A¥ -z Tiqac* sucrai in veast ow 

S (1^87) supra, «~ -^^T^, £j£ ^oi fe^pre/sion o, . 
levels of expression have been reported. Nail et al. <™ s Q Tyr 

g.iadin in yeast may be due in par, to ^^^^ Shas. In E. co.i however. 

B.ophys. Acta 527, 25-132 report mat ^ 3^53 S *L- 

g ,utamine. anc ^an.ne >s ^^'J^^.rna CopLon of specific planttissues 

35 acids than are maize embryo tRNAo. inis may .non^w . h _ jn To our knowledge, no 

may be adapted for optimum transition of highly oossible Sett of 

one has experimentally altered codon bias in highly expressed plant genes to determine
the protein translation in plants to check the effect on the level of expression.

SUMMARY OF THE INVENTION 

, is the over,! cbiect of me present ^^'^iSS 2tT£S 

damage. The invention disclosed herem compnses a ^^ ra ^™ L %J 3 synmetjc g e ne is designed 
« protein which is .unctionally eauivalent . a native ^g.^ ETthe synthetic gene be 

cissrr ifjzzzsx §r r rU ,ha synmetic " ne ,s at ,east 

M prote-n from Btt having, for examp.e. the ^^J^^J^S!^ ^equtaience. 

'nucleotide, . mTough 1793 or spanning nucleotide^ ' «25 nB SClS^ Snlt. me ONA sequence 
in designmg syr.metic Btt genes of th.s .nventwn for ennanced I express^ o w 9xpre ssed plan, 

of me nativl bI structural i?ne .. modified in order to< ^tttat founl in p^ts. and also 
genes, to anaHTan A + T content in nucleotide base - J^^S^ ma, cause destabi.ization. 

53 preferably to form a plant -nidation sequence STS-d seouence, that constitute 

inappropriate po.yaceny.ation. degradation and «ro«»on of RNA and to a ^ ^ ^ ^ 

secondary structure hairpin, and PNA sphce s.te,^ In <h. synthetic^ genes^ ^ ^ 

amino acid are selected with regard to the distribution frequency of codon usage employed in highly
expressed plant genes to specify that amino acid. As is appreciated by those skilled in the art, the
distribution frequency of codon usage utilized in the synthetic gene is a determinant of the level of
expression. Hence, the synthetic gene is designed such that its distribution frequency of codon usage
deviates, preferably, no more than 25% from that of highly expressed plant genes and, more preferably, no
more than about 10%. In addition, consideration is given to the percentage G+C content of the degenerate
third base (monocotyledons appear to favor G+C in this position, whereas dicotyledons do not). It is also
recognized that the XCG nucleotide is the least preferred codon in dicots whereas the XTA codon is
avoided in both monocots and dicots. The synthetic genes of this invention also preferably have CG and TA
doublet avoidance indices as defined in the Detailed Description closely approximating those of the chosen
host plant. More preferably these indices deviate from that of the host by no more than about 10-15%.

Assembly of the Bt gene of this invention is performed using standard technology known to the art. The
Btt structural gene designed for enhanced expression in plants of the specific embodiment is enzymatically
assembled within a DNA vector from chemically synthesized oligonucleotide duplex segments. The
synthetic Bt gene is then introduced into a plant host cell and expressed by means known to the art. The
insecticidal protein produced upon expression of the synthetic Bt gene in plants is functionally equivalent to
a native Bt crystal protein in having toxicity to the same insects.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 presents the nucleotide sequence for the synthetic Btt gene. Where different, the native
sequence as found in p544Pst-MetS is shown above. Changes in amino acids (underlined) occur in the
synthetic sequence with alanine replacing threonine at residue 2 and leucine replacing the stop at residue
596 followed by the addition of 13 amino acids at the C-terminus.

Figure 2 represents a simplified scheme used in the construction of the synthetic Btt gene. Segments
A through M represent oligonucleotide pieces annealed and ligated together to form DNA duplexes having
unique splice sites to allow specific enzymatic assembly of the DNA segments to give the desired gene.

Figure 3 is a schematic diagram showing the assembly of oligonucleotide segments in the construction
of a synthetic Btt gene. Each segment (A through M) is built from oligonucleotides of different sizes,
annealed and ligated to form the desired DNA segment.

DETAILED DESCRIPTION OF THE INVENTION 

The following definitions are provided in order to provide clarity as to the intent or scope of their usage
in the Specification and claims.

Expression refers to the transcription and translation of a structural gene to yield the encoded protein.
The synthetic Bt genes of the present invention are designed to be expressed at a higher level in plants
than the corresponding native Bt genes. As will be appreciated by those skilled in the art, structural gene
expression levels are affected by the regulatory DNA sequences (promoter, polyadenylation sites,
enhancers, etc.) employed and by the host cell in which the structural gene is expressed. Comparisons of
synthetic Bt gene expression and native Bt gene expression must be made employing analogous regulatory
sequences and in the same host cell. It will also be apparent that analogous means of assessing gene
expression must be employed in such comparisons.

Promoter refers to the nucleotide sequences at the 5' end of a structural gene which direct the initiation
of transcription. Promoter sequences are necessary, but not always sufficient, to drive the expression of a
downstream gene. In prokaryotes, the promoter drives transcription by providing binding sites to RNA
polymerases and other initiation and activation factors. Usually promoters drive transcription preferentially in
the downstream direction, although promotional activity can be demonstrated (at a reduced level of
expression) when the gene is placed upstream of the promoter. The level of transcription is regulated by
promoter sequences. Thus, in the construction of heterologous promoter/structural gene combinations, the
structural gene is placed under the regulatory control of a promoter such that the expression of the gene is
controlled by promoter sequences. The promoter is positioned preferentially upstream to the structural gene
and at a distance from the transcription start site that approximates the distance between the promoter and
the gene it controls in its natural setting. As is known in the art, some variation in this distance can be
tolerated without loss of promoter function.

A gene refers to the entire DNA portion involved in the synthesis of a protein. A gene embodies the
structural or coding portion which begins at the 5' end from the translational start codon (usually ATG) and
extends to the stop (TAG, TGA or TAA) codon at the 3' end. It also contains a promoter region, usually
located 5' or upstream to the structural gene, which initiates and regulates the expression of a structural
gene. Also included in a gene are the 3' end and poly(A)+ addition sequences.

Structural gene is that portion of a gene comprising a DNA segment encoding a protein, polypeptide or
a portion thereof, and excluding the 5' sequence which drives the initiation of transcription. The structural
gene may be one which is normally found in the cell or one which is not normally found in the cellular
location wherein it is introduced, in which case it is termed a heterologous gene. A heterologous gene may
be derived in whole or in part from any source known to the art, including a bacterial genome or episome,
eukaryotic nuclear or plasmid DNA, cDNA, viral DNA or chemically synthesized DNA. A structural gene
may contain one or more modifications in either the coding or the untranslated regions which could affect
the biological activity or the chemical structure of the expression product, the rate of expression or the
manner of expression control. Such modifications include, but are not limited to, mutations, insertions,
deletions and substitutions of one or more nucleotides. The structural gene may constitute an uninterrupted
coding sequence or it may include one or more introns, bounded by the appropriate splice junctions. The
structural gene may be a composite of segments derived from a plurality of sources, naturally occurring or
synthetic. The structural gene may also encode a fusion protein.

Synthetic gene refers to a DNA sequence of a structural gene that is chemically synthesized in its
entirety or for the greater part of the coding region. As exemplified herein, oligonucleotide building blocks
are synthesized using procedures known to those skilled in the art and are ligated and annealed to form
gene segments which are then enzymatically assembled to construct the entire gene. As is recognized by
those skilled in the art, functionally and structurally equivalent genes to the synthetic genes described
herein may be prepared by site-specific mutagenesis or other related methods used in the art.

Transforming refers to stably introducing a DNA segment carrying a functional gene into an organism
that did not previously contain that gene.

Plant tissue includes differentiated and undifferentiated tissues of plants, including but not limited to,
roots, shoots, leaves, pollen, seeds, tumor tissue and various forms of cells in culture, such as single cells,
protoplasts, embryos and callus tissue. The plant tissue may be in planta or in organ, tissue or cell culture.

Plant cell as used herein includes plant cells in planta and plant cells and protoplasts in culture.

Homology refers to identity or near identity of nucleotide or amino acid sequences. As is understood in
the art, nucleotide mismatches can occur at the third or wobble base in the codon without causing amino
acid substitutions in the final polypeptide sequence. Also, minor nucleotide modifications (e.g., substitutions,
insertions or deletions) in certain regions of the gene sequence can be tolerated and considered
insignificant whenever such modifications result in changes in amino acid sequence that do not alter
functionality of the final product. It has been shown that chemically synthesized copies of whole, or parts of,
gene sequences can replace the corresponding regions in the natural gene without loss of gene function.
Homologs of specific DNA sequences may be identified by those skilled in the art using the test of cross-
hybridization of nucleic acids under conditions of stringency as is well understood in the art (as described in
Hames and Higgins (eds.) (1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK). Extent of homology is
often measured in terms of percentage of identity between the sequences compared.

Functionally equivalent refers to identity or near identity of function. A synthetic gene product which is
toxic to at least one of the same insect species as a natural Bt protein is considered functionally equivalent
thereto. As exemplified herein, both natural and synthetic Btt genes encode 65 kDa proteins
having essentially identical amino acid sequences and having toxicity to coleopteran insects. The synthetic
Bt genes of the present invention are not considered to be functionally equivalent to native Bt genes, since
they are expressible at a higher level in plants than native Bt genes.

Frequency of preferred codon usage refers to the preference exhibited by a specific host cell in usage
of nucleotide codons to specify a given amino acid. To determine the frequency of usage of a particular
codon in a gene, the number of occurrences of that codon in the gene is divided by the total number of
occurrences of all codons specifying the same amino acid in the gene. Table 1, for example, gives the
frequency of codon usage for Bt genes, which was obtained by analysis of four Bt genes whose sequences
are publicly available. Similarly, the frequency of preferred codon usage exhibited by a host cell can be
calculated by averaging frequency of preferred codon usage in a large number of genes expressed by the
host cell. It is preferable that this analysis be limited to genes that are highly expressed by the host cell.

Table 1, for example, gives the frequency of codon usage by highly expressed genes exhibited by
dicotyledonous plants, and monocotyledonous plants. The dicot codon usage was calculated using
highly expressed coding sequences obtained from Genbank. Monocot codon
usage was calculated using 53 monocot nuclear gene coding sequences obtained from Genbank.

When synthesizing a gene for improved expression in a host cell it is desirable to design the gene such
that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.
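
The frequency calculation defined above (occurrences of a codon divided by occurrences of all codons for the same amino acid) can be sketched in Python. This is only an illustration and not part of the disclosure: the names `CODON_TABLE` and `codon_usage_frequency` are hypothetical, the standard genetic code is assumed, and stop codons are assumed to be left out of the tally.

```python
from collections import Counter
from itertools import product

# Standard genetic code built in TCAG order; '*' marks the three stop codons.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage_frequency(coding_sequence):
    """Return {codon: frequency among all codons for the same amino acid}."""
    seq = coding_sequence.upper()
    # Split into in-frame triplets, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Count sense codons only (assumption: stops and unknown triplets are skipped).
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]]
            for codon, n in counts.items()}


# Toy coding sequence: Met, four valine codons, stop.
freq = codon_usage_frequency("ATGGTAGTTGTTGTCTAA")
print(freq["GTT"])  # 0.5 (two of the four valine codons)
```

A host-cell frequency table of the kind the text describes would come from averaging such per-gene results over many highly expressed genes; the averaging scheme is not specified here and is left out of the sketch.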

The percent deviation of the frequency of preferred codon usage for a synthetic gene from that
employed by a host cell is calculated first by determining the percent deviation of the frequency of usage of
a single codon from that of the host cell followed by obtaining the average deviation over all codons. As
defined herein this calculation includes unique codons (i.e., ATG and TGG). The frequency of preferred
codon usage of the synthetic Btt gene, whose sequence is given in Figure 1, is given in Table 1. The
frequency of preferred usage of the codon 'GTA' for valine in the synthetic gene (0.10) deviates from that
preferred by dicots (0.12) by 0.02/0.12 = 0.167 or 16.7%. The average deviation over all amino acid
codons of the Btt synthetic gene codon usage from that of dicot plants is 7.5%. In general terms the overall
average deviation of the codon usage of a synthetic gene from that of a host cell is calculated using the
equation



$$\text{average deviation} = \frac{1}{Z}\sum_{n=1}^{Z}\frac{\left|X_n - Y_n\right|}{X_n}\times 100$$

where X_n = frequency of usage for codon n in the host cell; Y_n = frequency of usage for codon n in the
synthetic gene. Where n represents an individual codon that specifies an amino acid, the total number of
codons is Z, which in the preferred embodiment is 61. The overall deviation of the frequency of codon
usage for all amino acids should preferably be less than about 25%, and more preferably less than about
10%.
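
The average-deviation calculation can be checked numerically with a short Python sketch. The function name `average_percent_deviation` is hypothetical, and skipping codons whose host frequency is zero (where the ratio is undefined) is an assumption of this sketch rather than something the text specifies.

```python
def average_percent_deviation(host_freq, synthetic_freq):
    """Mean over codons n of |X_n - Y_n| / X_n * 100.

    host_freq:      {codon: X_n}, frequency of usage in the host cell
    synthetic_freq: {codon: Y_n}, frequency of usage in the synthetic gene
    """
    deviations = []
    for codon, x in host_freq.items():
        if x == 0:
            continue  # assumption: ratio undefined when the host never uses the codon
        y = synthetic_freq.get(codon, 0.0)
        deviations.append(abs(x - y) / x * 100)
    if not deviations:
        raise ValueError("no codon with a non-zero host frequency")
    return sum(deviations) / len(deviations)


# The single-codon example from the text: GTA is 0.12 in dicots, 0.10 in the synthetic gene.
print(round(average_percent_deviation({"GTA": 0.12}, {"GTA": 0.10}), 1))  # 16.7
```

Applied to all 61 sense codons, the same function would give the overall figure that the text asks to keep below about 25%, and preferably below about 10%.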

Derived from is used to mean taken, obtained, received, traced, replicated or descended from a source
(chemical and/or biological). A derivative may be produced by chemical or biological manipulation
(including but not limited to substitution, addition, insertion, deletion, extraction, isolation, mutation and
replication) of the original source.

Chemically synthesized, as related to a sequence of DNA, means that the component nucleotides were
assembled in vitro. Manual chemical synthesis of DNA may be accomplished using well established
procedures (Caruthers, M. (1983) in Methodology of DNA and RNA Sequencing, Weissman (ed.), Praeger
Publishers, New York, Chapter 1), or automated chemical synthesis can be performed using one of a
number of commercially available machines.

The term designed to be highly expressed as used herein refers to a level of expression of a designed
gene wherein the amount of its specific mRNA transcripts produced is sufficient to be quantified in Northern
blots and, thus, represents a level of specific mRNA expressed corresponding to greater than or equal to
approximately 0.001% of the poly(A)+ mRNA. To date, natural Bt genes are transcribed at a level wherein
the amount of specific mRNA produced is insufficient to be estimated using the Northern blot technique.
However, in the present invention, transcription of a synthetic Bt gene designed to be highly expressed not
only allows quantification of the specific mRNA transcripts produced but also results in enhanced
expression of the translation product which is measured in insecticidal bioassays.

Crystal protein or insecticidal crystal protein or crystal toxin refers to the major protein component of
the parasporal crystals formed in strains of Bt. This protein component exhibits selective pathogenicity to
different species of insects. The molecular size of the major protein isolated from parasporal crystals varies
depending on the strain of Bt from which it is derived. Crystal proteins having molecular weights of
approximately 132, 65, and 28 kDa have been reported. It has been shown that the approximately 132 kDa
protein is a protoxin that is cleaved to form an approximately 65 kDa toxin.

The crystal protein gene refers to the DNA sequence encoding the insecticidal crystal protein in either
full length protoxin or toxin form, depending on the strain of Bt from which the gene is derived.

The authors of this invention observed that expression in plants of Bt crystal protein mRNA occurs at
levels that are not routinely detectable in Northern blots and that low levels of Bt crystal protein expression
correspond to this low level of mRNA expression. It is preferred for exploitation of these genes as potential
biocontrol methods that the level of expression of Bt genes in plant cells be improved and that the stability
of Bt mRNA in plants be optimized. This will allow greater levels of Bt mRNA to accumulate and will result
in an increase in the amount of insecticidal protein in plant tissues. This is essential for the control of
insects that are relatively resistant to Bt protein.

Thus, this invention is based on the recognition that expression levels of desired, recombinant
insecticidal protein in transgenic plants can be improved via increased expression of stabilized mRNA
transcripts; and that, conversely, detection of these stabilized RNA transcripts may be utilized to measure
expression of translational product (protein). This invention provides a means of resolving the problem of
low expression of insecticidal protein RNA in plants and, therefore, of low protein expression through the
use of an improved, synthetic gene specifying an insecticidal crystal protein from Bt.

Attempts to improve the levels of expression of Bt genes in plants have centered on comparative
studies evaluating parameters such as gene type, gene length, choice of promoters, addition of plant viral
untranslated RNA leader, addition of intron sequence and modification of nucleotides surrounding the
initiation ATG codon. To date, changes in these parameters have not led to significant enhancement of Bt
protein expression in plants. Applicants find that, surprisingly, to express Bt proteins at the desired level in
plants, modifications in the coding region of the gene were effective. Structure-function relationships can be
studied using site-specific mutagenesis by replacement of restriction fragments with synthetic DNA
duplexes containing the desired nucleotide changes (Lo et al. (1984) Proc. Natl. Acad. Sci. 81:2285-2289).
However, recent advances in recombinant DNA technology now make it feasible to chemically synthesize
an entire gene designed specifically for a desired function. Thus, the Btt coding region was chemically
synthesized, modified in such a way as to improve its expression in plants. Also, gene synthesis provides
the opportunity to design the gene so as to facilitate its subsequent mutagenesis by incorporating a number
of appropriately positioned restriction endonuclease sites into the gene.

The present invention provides a synthetic Bt gene for a crystal protein toxic to an insect. As
exemplified herein, this protein is toxic to coleopteran insects. To the end of improving expression of this
insecticidal protein in plants, this invention provides a DNA segment homologous to a Btt structural gene
and, as exemplified herein, having approximately 85% homology to the Btt structural gene in p544Pst-MetS.
In this embodiment the structural gene encoding a Btt insecticidal protein is obtained through chemical
synthesis of the coding region. A chemically synthesized gene is used in this embodiment because it best
allows for easy and efficacious accommodation of modifications in nucleotide sequences required to
achieve improved levels of cross-expression.

Today, in general, chemical synthesis is a preferred method to obtain a desired modified gene.
However, to date, no plant protein gene has been chemically synthesized nor has any synthetic gene for a
bacterial protein been expressed in plants. In this invention, the approach adopted for synthesizing the gene
consists of designing an improved nucleotide sequence for the coding region and assembling the gene
from chemically synthesized oligonucleotide segments. In designing the gene, the coding region of the
naturally-occurring gene, preferably from the Btt subclone, p544Pst-MetS, encoding a 65 kDa polypeptide
having coleopteran toxicity, is scanned for possible modifications which would result in improved expression
of the synthetic gene in plants. For example, to optimize the efficiency of translation, codons preferred in
highly expressed proteins of the host cell are utilized.

Bias in codon choice within genes in a single species appears related to the level of expression of the
protein encoded by that gene. Codon bias is most extreme in highly expressed proteins of E. coli and
yeast. In these organisms a strong positive correlation has been reported between the abundance of an
isoaccepting tRNA species and the favored synonymous codon. In one group of highly expressed proteins
in yeast, over 96% of the amino acids are encoded by only 25 of the 61 available codons (Bennetzen and
Hall (1982) J. Biol. Chem. 257:3026-3031). Replacement of preferred codons in the highly
expressed yeast gene PGK results in a decreased level of both protein and mRNA.
Biased codon choice in highly expressed genes enhances translation and is required for maintaining mRNA
stability in yeast. Without doubt, the degree of codon bias is an important factor to consider when
engineering high expression of heterologous genes in yeast and other systems.

Experimental evidence obtained from point mutations and deletion analysis has indicated that in
eukaryotic genes specific sequences are associated with post-transcriptional processing, RNA
destabilization, translational termination, intron splicing and the like. These are preferably employed in the
synthetic genes of this invention. In designing a bacterial gene for expression in plants, sequences which
interfere with the efficacy of gene expression are eliminated.

In designing a synthetic gene, modifications in nucleotide sequence of the coding region are made to
modify the A+T content in DNA base composition of the synthetic gene to reflect that normally found in
genes for highly expressed proteins native to the host cell. Preferably the A+T content of the synthetic
gene is substantially equal to that of said genes for highly expressed proteins. In genes encoding highly
expressed plant proteins, the A+T content is approximately 55%. It is preferred that the synthetic gene
have an A+T content near this value, and not sufficiently high as to cause destabilization of RNA and,
therefore, lower the protein expression levels. More preferably, the A+T content is no more than about
and most preferably is about 55%. Also, for ultimate expression in plants, the synthetic gene nucleotide
sequence is preferably modified to form a plant initiation sequence at the 5' end of the coding region. In
addition, particular attention is preferably given to assure that unique restriction sites are placed in strategic
positions to allow efficient assembly of oligonucleotide segments during construction of the synthetic gene
and to facilitate subsequent nucleotide modification. As a result of these modifications in coding region of
the native Bt gene, the preferred synthetic gene is expressed in plants at an enhanced level when
compared to that observed with natural Bt structural genes.
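
As a small illustration of the A+T criterion discussed above, the base composition of a candidate sequence can be computed directly. The helper name `at_content` is hypothetical, and ignoring characters other than A, C, G and T is an assumption of this sketch.

```python
def at_content(sequence):
    """Return the A+T content of a DNA sequence as a percentage."""
    # Assumption: anything that is not A, C, G or T (e.g. 'N', whitespace) is ignored.
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


print(at_content("ATGGCTACTA"))  # 60.0
```

A designer could compare this figure for a draft synthetic gene against the approximately 55% that the text reports for genes encoding highly expressed plant proteins.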

In specific embodiments, the synthetic Bt gene of this invention encodes a Btt protein toxic to
coleopteran insects. Preferably, the toxic polypeptide is about 598 amino acids in length, is at least 75%
homologous to a Btt polypeptide, and, as exemplified herein, is essentially identical to the protein encoded
by p544Pst-MetS, except for replacement of threonine by alanine at residue 2. This amino acid substitution
results as a consequence of the necessity to introduce a guanine base at position +4 in the coding
sequence.

In designing the synthetic gene of this invention, the coding region from the Btt subclone, p544Pst-MetS,
encoding a 65 kDa polypeptide having coleopteran toxicity, is scanned for possible modifications
which would result in improved expression of the synthetic gene in plants. For example, in preferred
embodiments, the synthetic insecticidal protein is strongly expressed in dicot plants, e.g., tobacco, tomato,
cotton, etc., and hence, a synthetic gene under these conditions is designed to incorporate to advantage
codons used preferentially by highly expressed dicot proteins. In embodiments where enhanced expression
of insecticidal protein is desired in a monocot, codons preferred by highly expressed monocot proteins
(given in Table 1) are employed in designing the synthetic gene.

In general, genes within a taxonomic group exhibit similarities in codon choice, regardless of the
function of these genes. Thus an estimate of the overall use of the genetic code by a taxonomic group can
be obtained by summing codon frequencies of all its sequenced genes. This species-specific codon choice
is reported in this invention from analysis of 208 plant genes. Both monocot and dicot plants are analyzed
individually to determine whether these broader taxonomic groups are characterized by different patterns of
synonymous codon preference. The 208 plant genes included in the codon analysis code for proteins
having a wide range of functions and they represent 6 monocot and 36 dicot species. These proteins are
present in different plant tissues at varying levels of expression.

In this invention it is shown that the relative use of synonymous codons differs between the monocots
and the dicots. In general, the most important factor in discriminating between monocot and dicot patterns
of codon usage is the percentage G+C content of the degenerate third base. In monocots, 16 of 18 amino
acids favor G+C in this position, while dicots only favor G+C in 7 of 18 amino acids.
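
The pooling of codon frequencies and the third-base G+C comparison described above can be sketched as follows. This is a minimal illustration and not the procedure actually used for the 208 plant genes: the function names are hypothetical, the two toy sequences are invented, and "favoring G+C" is assumed to mean that G- or C-ending codons account for more than half of an amino acid's codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def pooled_codon_counts(genes):
    """Sum codon counts over all coding sequences (in frame, A/C/G/T only)."""
    counts = Counter()
    for seq in genes:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    return counts

def distribution_fractions(counts):
    """Fraction of each codon among the codons for the same amino acid."""
    totals = Counter()
    for codon, n in counts.items():
        totals[GENETIC_CODE[codon]] += n
    return {codon: n / totals[GENETIC_CODE[codon]] for codon, n in counts.items()}

def amino_acids_favoring_gc(fractions):
    """Amino acids whose G- or C-ending codons make up more than half of usage.

    Met and Trp (one codon each) and stop codons are excluded, leaving the
    18 amino acids with degenerate codons.
    """
    gc_share = {}
    for codon, frac in fractions.items():
        aa = GENETIC_CODE[codon]
        if aa in "MW*":
            continue
        gc_share.setdefault(aa, 0.0)
        if codon[2] in "GC":
            gc_share[aa] += frac
    return sorted(aa for aa, share in gc_share.items() if share > 0.5)

# Two invented toy coding sequences, for illustration only.
toy_genes = ["ATGGCCGCCAAGAAGTAA", "ATGGCTGCCAAAAAGTGA"]
fractions = distribution_fractions(pooled_codon_counts(toy_genes))
print(amino_acids_favoring_gc(fractions))  # ['A', 'K']
```

Run separately on a monocot pool and a dicot pool, the length of the returned list would correspond to counts such as the "16 of 18" and "7 of 18" quoted above, subject to the assumed definition of preference.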

The G ending codons for Thr, Pro, Ala and Ser are avoided in both monocots and dicots because they
contain C in codon position II. The CG dinucleotide is strongly avoided in plants (Boudraa (1987) Genet.
Sel. Evol. 19:143-154) and other eukaryotes (Grantham et al. (1985) Bull. Inst. Pasteur 83:95-148), possibly
due to regulation involving methylation. In dicots, XCG is always the least favored codon, while in monocots
this is not the case. The doublet TA is also avoided in codon positions II and III in most eukaryotes, and this
is true of both monocots and dicots.

Grantham and colleagues ((1986) Oxford Surveys in Evol. Biol. 3:48-81) have developed two codon
choice indices to quantify CG and TA doublet avoidance in codon positions II and III. XCG/XCC is the ratio
of G-ending to C-ending triplets among codons having C as base II, while XTA/XTT is the ratio of A-ending
to T-ending triplets with T as the second base. These indices have been calculated for the plant data in this
paper (Table 2) and support the conclusion that monocot and dicot species differ in their use of these
dinucleotides.
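
The two doublet indices can be computed from pooled codon counts as in the sketch below. It assumes that each index is taken over summed codon counts and scaled by 100, as in Table 2; the function name `doublet_index` and the example counts are invented for illustration.

```python
from collections import Counter

def doublet_index(counts, numerator_doublet, denominator_doublet):
    """100 x (codons ending in numerator doublet) / (codons ending in denominator doublet).

    The doublet is bases II and III of the codon, so "CG" selects every XCG codon.
    Raises ZeroDivisionError if no codon ends in the denominator doublet.
    """
    num = sum(n for codon, n in counts.items() if codon[1:] == numerator_doublet)
    den = sum(n for codon, n in counts.items() if codon[1:] == denominator_doublet)
    return 100.0 * num / den

# Invented pooled codon counts, for illustration only.
counts = Counter({"GCG": 3, "GCC": 10, "ACG": 1, "ACC": 10,
                  "TTA": 2, "TTT": 8, "CTA": 2, "CTT": 8})

xcg_xcc = doublet_index(counts, "CG", "CC")  # XCG/XCC index
xta_xtt = doublet_index(counts, "TA", "TT")  # XTA/XTT index
print(xcg_xcc, xta_xtt)  # 20.0 25.0
```

Values well below 100 indicate avoidance of the CG or TA doublet relative to the comparison codons.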



Table 2

Avoidance of CG and TA doublets in codon positions II-III.

XCG/XCC and XTA/XTT values are multiplied by 100.

Group          XCG/XCC    XTA/XTT

Plants            40         37
Dicots            30         35
Monocots          51         47
Maize             67         43
Soybean           37         41
RuBPC SSU         18          9
CAB               22         13

RuBPC SSU = ribulose 1,5 bisphosphate small subunit
CAB = chlorophyll a/b binding protein

[...] maize, species-specific codon usage profiles were calculated [...] smaller portion of the entire
dicot sample.

[...] genes such as the ribulose 1,5 bisphosphate small subunit (RuBPC SSU) and [...] that of plant
genes in general, codon usage profiles [...] of genes (19 and 17 sequences, respectively) were calculated
(not [...]). [...] CAB [...] characterized by [...] dicot samples (Table 2). [...] that of the monocots [...]

The use of pooled data for highly expressed genes [...] of species-specific patterns in codon choice.
Therefore the [...] genes for RuBPC SSU and CAB were tabulated. The preferred codons of the [...]
RuBPC SSU and CAB are more restricted in general than are those of the dicot species. [...] with
Matsuoka et al. ((1987) J. Biochem. 102:673-676), who noted the extreme [...] as well as two other
highly expressed genes in maize leaves [...]. These genes almost completely avoid the use of A+T in codon
position III, [...] codon bias was not as pronounced in non-leaf proteins such as [...] synthetase
and ATP/ADP translocator. Since the wheat SSU and CAB genes [...] a similar pattern of codon
preference, this may [...] genes in leaves. The CAB gene for Lemna and the RuBPC SSU genes [...]
extreme preference for [...] bases preferred by some synonymous codons (e.g., GCT for Ala, CTT [...])
[...] in monocots. [...] preference is less pronounced for both RuBPC SSU [...]

[...] attempts are also made to eliminate sequences [...] to eliminate potentially deleterious sequences.



[...] plant coding region. Since [...] Btt [...]

However, it appears that equally stable transcripts are obtained in the absence of this extension polypeptide
containing thirty-nine nucleotides.

Not all of the above-mentioned modifications of the natural Bt gene must be made in constructing a
synthetic Bt gene in order to obtain enhanced expression. For example, a synthetic gene may be
synthesized for other purposes in addition to that of achieving enhanced levels of expression. Under these
conditions, the original sequence of the natural Bt gene may be preserved within a region of DNA
corresponding to one or more, but not all, segments used to construct the synthetic gene. Depending on
the desired purpose of the gene, modification may encompass substitution of one or more, but not all, of
the oligonucleotide segments used to construct the synthetic gene by a corresponding region of natural Bt
sequence.

As is known to those skilled in the art of synthesizing genes (Mandecki et al. (1985) Proc. Natl. Acad.
Sci. 82:3543-3547; Feretti et al. (1986) Proc. Natl. Acad. Sci. 83:599-603), the DNA sequence to be
synthesized is divided into segment lengths which can be synthesized conveniently and without undue
complication. As exemplified herein, in preparing to synthesize the Btt gene, the coding region is divided
into thirteen segments (A - M). Each segment has unique restriction sequences at the cohesive ends.
Segment A, for example, is 228 base pairs in length and is constructed from six oligonucleotide sections,
each containing approximately 75 bases. Single-stranded oligonucleotides are annealed and ligated to form
DNA segments. The length of the protruding cohesive ends in complementary oligonucleotide segments is
four to five residues. In the strategy evolved for gene synthesis, the sites designed for the joining of
oligonucleotide pieces and DNA segments are different from the restriction sites created in the gene.
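
The division of a segment into oligonucleotides with staggered junctions can be sketched as below. This is an illustrative assumption about the layout, not the actual oligonucleotide design: it places three oligonucleotides on each strand of a 228 bp duplex, offsets the junctions on the two strands by four bases so that adjoining pieces overlap, and does not model the restriction-site ends of the segment. The placeholder sequence and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def cut(seq, cuts):
    """Split seq at the given 0-based cut positions."""
    edges = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]

def staggered_oligos(segment, n_per_strand, overhang):
    """Split a duplex into n oligos per strand with junctions offset by `overhang`.

    `segment` is the top strand, 5'->3'. Bottom-strand junctions are shifted by
    `overhang` bases relative to top-strand junctions, so neighbouring pieces
    pair across each junction through a cohesive overlap of that length.
    Bottom oligos are returned 5'->3', in top-strand order.
    """
    size = len(segment) // n_per_strand
    top_cuts = [i * size for i in range(1, n_per_strand)]
    bottom_cuts = [c + overhang for c in top_cuts]
    top_oligos = cut(segment, top_cuts)
    bottom_oligos = [reverse_complement(p) for p in cut(segment, bottom_cuts)]
    return top_oligos, bottom_oligos

segment_a = "ACGT" * 57  # 228-base placeholder, not the real segment A sequence
top_oligos, bottom_oligos = staggered_oligos(segment_a, n_per_strand=3, overhang=4)
print([len(o) for o in top_oligos])     # [76, 76, 76]
print([len(o) for o in bottom_oligos])  # [80, 76, 72]
```

Six pieces of roughly 75 bases result, consistent with the figures quoted for segment A.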

In the specific embodiment, each DNA segment is cloned into a pIC-20 vector for amplification of the
DNA. The nucleotide sequence of each fragment is determined at this stage by the dideoxy method using
the recombinant phage DNA as templates and selected synthetic oligonucleotides as primers.

As exemplified herein and illustrated schematically in Figures 3 and 4, each segment individually (e.g.,
segment M) is excised at the flanking restriction sites from its cloning vector and spliced into the vector
containing segment A. Most often, segments are added as a paired segment instead of as a single segment
to increase efficiency. Thus, the entire gene is constructed in the original plasmid harboring segment A. The
nucleotide sequence of the entire gene is determined and found to correspond exactly to that shown in
Figure 1.

In preferred embodiments the synthetic Btt gene is expressed in plants at an enhanced level when
compared to that observed with natural Btt structural genes. To that end, the synthetic structural gene is
combined with a promoter functional in plants, the structural gene and the promoter region being in such
position and orientation with respect to each other that the structural gene can be expressed in a cell in
which the promoter region is active, thereby forming a functional gene. The promoter regions include, but
are not limited to, bacterial and plant promoter regions. To express the promoter region/structural gene
combination, the DNA segment carrying the combination is contained by a cell. Combinations which include
plant promoter regions are contained by plant cells, which, in turn, may be contained by plants or seeds.
Combinations which include bacterial promoter regions are contained by bacteria, e.g., Bt or E. coli. Those
in the art will recognize that expression in types of micro-organisms other than bacteria may in some
circumstances be desirable and, given the present disclosure, feasible without undue experimentation.

The recombinant DNA molecule carrying a synthetic structural gene under promoter control can be
introduced into plant tissue by any means known to those skilled in the art. The technique used for a given
plant species or specific type of plant tissue depends on the known successful techniques. As novel means
are developed for the stable insertion of foreign genes into plant cells and for manipulating the modified
cells, skilled artisans will be able to select from known means to achieve a desired result. Means for
introducing recombinant DNA into plant tissue include, but are not limited to, direct DNA uptake
(Paszkowski, J. et al. (1984) EMBO J. 3:2717), electroporation (Fromm, M. et al. (1985) Proc. Natl. Acad.
Sci. USA 82:5824), microinjection (Crossway, A. et al. (1986) Mol. Gen. Genet. 202:179), or T-DNA
mediated transfer from Agrobacterium tumefaciens to the plant tissue. There appears to be no fundamental
limitation of T-DNA transformation to the natural host range of Agrobacterium. Successful T-DNA-mediated
transformation of monocots (Hooykaas-Van Slogteren, G. et al. (1984) Nature 312:763), gymnosperms
(Dandekar, A. et al. (1987) Biotechnology 5:587) and algae (Ausich, R., EPO application 108,580) has been
reported. Representative T-DNA vector systems are described in the following references: An, G. et al.
(1985) EMBO J. 4:277; Herrera-Estrella, L. et al. (1983) Nature 303:209; Herrera-Estrella, L. et al. (1983)
EMBO J. 2:987; Herrera-Estrella, L. et al. (1985) in Plant Genetic Engineering, New York: Cambridge
University Press, p. 63. Once introduced into the plant tissue, the expression of the structural gene may be
assayed by any means known to the art, and expression may be measured as mRNA transcribed or as
protein synthesized. Techniques are known for the in vitro culture of plant tissue, and in a number of cases,
for regeneration into whole plants. Procedures for transferring the introduced expression complex to
commercially useful cultivars are known to those skilled in the art.

In one of its preferred embodiments the invention disclosed herein comprises expression in plant cells
of a synthetic insecticidal structural gene under control of a plant expressible promoter, that is to say, by
inserting the insecticide structural gene into T-DNA under control of a plant expressible promoter and
introducing the T-DNA containing the insert into a plant cell using known means. Once plant cells
expressing a synthetic insecticidal structural gene under control of a plant expressible promoter are
obtained, plant tissues and whole plants can be regenerated therefrom using methods and techniques well-
known in the art. The regenerated plants are then reproduced by conventional means and the introduced
genes can be transferred to other strains and cultivars by conventional plant breeding techniques.

The introduction and expression of the synthetic structural gene for an insecticidal protein can be used
to protect a crop from infestation with common insect pests. Other uses of the invention, exploiting the
properties of other insecticide structural genes introduced into other plant species will be readily apparent
to those skilled in the art. The invention in principle applies to introduction of any synthetic insecticide
structural gene into any plant species into which foreign DNA (in the preferred embodiment T-DNA) can be
introduced and in which said DNA can remain stably replicated. In general, these taxa presently include, but
are not limited to, gymnosperms and dicotyledonous plants, such as sunflower (family Compositeae),
tobacco (family Solanaceae), alfalfa, soybean, and other legumes (family Leguminoseae), cotton (family
Malvaceae), and most vegetables, as well as monocotyledonous plants. A plant containing in its tissues
increased levels of insecticidal protein will control less susceptible types of insect, thus providing advantage
over present insecticidal uses of Bt. By incorporation of the insecticidal protein into the tissues of a plant,
the present invention additionally provides advantage over present uses of insecticides by eliminating
instances of nonuniform application and the costs of buying and applying insecticidal preparations to a field.
Also, the present invention eliminates the need for careful timing of application of such preparations since
[...] insecticidal protein and the protein is always present, minimizing crop
damage that would otherwise result from preapplication larval foraging.

This invention combines the specific teachings of the present disclosure with a variety of techniques
and expedients known in the art. The choice of expedients depends on variables such as the choice of
insecticidal protein from a Bt strain, the extent of modification in preferred codon usage, manipulation of
sequences [...] prematurely terminating transcription, [...]
addition of introns or enhancer sequences to the 5' and/or 3' ends of the synthetic structural gene, the
promoter region, the host in which a promoter region/structural gene combination is expressed, and the like.
As novel insecticidal proteins and toxic polypeptides are discovered, and as sequences responsible for
enhanced cross-expression (expression of a foreign structural gene in a given host) are elucidated, those of
ordinary skill will be able to select among those elements to produce improved synthetic genes for
desired proteins having agronomic value. The fundamental aspect of the present invention is the ability to
synthesize a novel gene coding for an insecticidal protein, designed so that the protein will be expressed at
an enhanced level in plants, yet so that it will retain its inherent property of insect toxicity and retain or
increase its specific insecticidal activity.



EXAMPLES 

The following Examples are presented as illustrations of embodiments of the present invention. They do
[...] Northern Regional Research
Center, 1815 N. University Street, Peoria, Illinois 61604.



Strain                              Deposited on        Accession #

E. coli MC1061 (p544-HindIII)       6 October 1987      NRRL B-18257
E. coli MC1061 (p544Pst-Met5)       6 October 1987      NRRL B-18258

The deposited strains are provided for the convenience of those in the art, and are not necessary to
practice the present invention, which may be practiced [...]
publicly available protocols, information, and materials. E. coli MC1061, a good host for plasmid
transformations, was disclosed by Casadaban, M.J. and Cohen, S.N. (1980) J. Mol. Biol. 138:179-207.



Example 1: Design of the synthetic insecticidal crystal protein gene.



Preparation of toxic subclones of the Btt gene

Construction, isolation, and characterization of pNSBP544 is disclosed by Sekar, V. et al. (1987) Proc.
Natl. Acad. Sci. USA 84:7036-7040, and Sekar, V. and Adang, M.J., U.S. patent application serial no.
108,285, filed October 13, 1987, which is hereby incorporated by reference. A 3.0 kbp HindIII fragment
carrying the crystal protein gene of pNSBP544 is inserted into the HindIII site of pIC-20H (Marsh, J.L. et al.
(1984) Gene 32:481-485), thereby yielding a plasmid designated p544-HindIII, which is on deposit.
Expression in E. coli yields a 73 kDa crystal protein in addition to the 65 kDa species characteristic of the
crystal protein obtained from Btt isolates.

A 5.9 kbp BamHI fragment carrying the crystal protein gene is removed from pNSBP544 and inserted
into BamHI-linearized pIC-20H DNA. The resulting plasmid, p405/44-7, is digested with BglII and religated,
thereby removing Bacillus sequences flanking the 3'-end of the crystal protein gene. The resulting plasmid,
p405/54-12, is digested with PstI and religated, thereby removing Bacillus sequences flanking the 5'-end of
the crystal protein and about 150 bp from the 5'-end of the crystal protein structural gene. The resulting
plasmid, p405/81-4, is digested with SphI and PstI and is mixed with and ligated to a synthetic linker having
the following structure:

        SD         MetThrAla
5'     CAGGATCCAACAATGACTGCA 3'
3' GTACGTCCTAGGTTGTTACTG     5'
   SphI                 PstI

(SD indicates the location of a Shine-Dalgarno prokaryotic ribosome binding site.) The resulting plasmid,
p544Pst-Met5, contains a structural gene encoding a protein identical to one encoded by pNSBP544 except
for a deletion of the amino-terminal 47 amino acid residues. The nucleotide sequence of the Btt coding
region in p544Pst-Met5 is presented in Figure 1. In bioassays (Sekar and Adang, U.S. patent application
serial no. 108,285, supra), the proteins encoded by the full-length Btt gene in pNSBP544 and the N-terminal
deletion derivative, p544Pst-Met5, were shown to be equally toxic. All of the plasmids mentioned above
have their crystal protein genes in the same orientation as the lacZ gene of the vector.
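
The pairing of the two linker strands can be verified with a short sketch. It assumes the strands anneal with a four-base single-stranded end on each side (an SphI-compatible end on the lower strand and a PstI-compatible end on the upper strand); variable names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

top = "CAGGATCCAACAATGACTGCA"          # upper strand, written 5'->3'
bottom_3to5 = "GTACGTCCTAGGTTGTTACTG"  # lower strand, written 3'->5' as in the text

# Assumed single-stranded ends: four bases on each strand.
sph_overhang = bottom_3to5[:4]  # protrudes on the SphI side
pst_overhang = top[-4:]         # protrudes on the PstI side

# The remaining 17 bases of each strand should be base-paired.
paired_top = top[:-4]
paired_bottom = bottom_3to5[4:]
assert paired_top.translate(COMPLEMENT) == paired_bottom

# Read the three codons that follow the first ATG in the upper strand.
start = top.find("ATG")
codons = [top[i:i + 3] for i in range(start, start + 9, 3)]
print(sph_overhang, pst_overhang, codons)  # GTAC TGCA ['ATG', 'ACT', 'GCA']
```

The three codons read Met-Thr-Ala, matching the annotation above the linker.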



Modification of preferred codon usage

Table 1 presents the frequency of codon usage for (A) dicot proteins, (B) Bt proteins, (C) the synthetic
Btt gene, and (D) monocot proteins. Although some codons for a particular amino acid are utilized to
approximately the same extent by both dicot and Bt proteins (e.g., the codons for serine), for the most part
the distribution of codon frequency varies significantly between dicot and Bt proteins, as illustrated in
columns A and B in Table 1.
Table 1. Frequency of Codon Usage

                         Distribution Fraction

Amino           (A) Dicot   (B) Bt    (C) Synthetic   (D) Monocot
Acid    Codon   Genes       Genes     Btt Gene        Genes

Gly     GGG     0.12        0.08      0.13            0.21
Gly     GGA     0.37        0.53      0.37            0.13
Gly     GGT     ?           0.24      0.34            0.21
Gly     GGC     0.16        0.16      0.16            0.40
Glu     GAG     0.52        0.13      0.52            0.77
Glu     GAA     0.43        0.87      0.48            0.23
Asp     GAT     0.57        0.68      0.56            0.31
Asp     GAC     0.43        0.32      0.44            0.69
Val     GTG     0.30        0.15      0.30            0.38
Val     GTA     0.12        0.32      0.10            0.07
Val     GTT     0.38        0.29      0.35            0.20
Val     GTC     0.20        0.24      0.25            0.34
Ala     GCG     0.05        0.12      0.06            0.20
Ala     GCA     0.26        0.50      0.24            0.16
Ala     GCT     0.42        0.32      0.41            0.28
Ala     GCC     0.28        0.06      0.29            0.36
Lys     AAG     0.61        0.13      0.58            0.87
Lys     AAA     0.39        0.87      0.42            0.13
Asn     AAT     0.49        0.79      0.44            0.23
Asn     AAC     0.55        0.21      0.56            0.77
Met     ATG     1.00        1.00      1.00            1.00
Ile     ATA     0.19        0.30      0.20            0.09
Ile     ATT     0.44        0.57      0.43            0.27
Ile     ATC     0.36        0.13      0.37            0.64
Thr     ACG     0.07        0.14      0.07            0.18
Thr     ACA     0.27        0.68      0.27            0.14
Thr     ACT     0.36        0.14      0.34            0.22
Thr     ACC     0.31        0.05      0.32            0.47
Trp     TGG     1.00        1.00      1.00            1.00
End     TGA     0.46        0.00      0.00            0.34
Cys     TGT     0.43        0.33      0.33            0.27
Cys     TGC     0.57        0.67      0.67            0.73
End     TAG     0.18        0.00      0.00            0.44
End     TAA     0.37        1.00      1.00            0.22
Tyr     TAT     0.42        0.81      0.43            0.19
Tyr     TAC     0.58        0.19      0.57            0.81
Phe     TTT     0.45        0.75      0.44            0.28
Phe     TTC     0.55        0.25      0.56            0.72
Ser     AGT     0.14        0.25      0.13            0.07
Ser     AGC     0.18        0.13      0.19            0.25
Ser     TCG     0.05        0.08      0.06            0.13
Ser     TCA     0.18        0.19      0.17            0.13
Ser     TCT     0.26        0.25      0.27            0.18
Ser     TCC     0.19        0.10      0.17            0.24
Arg     AGG     0.22        0.09      0.23            0.28
Arg     AGA     0.31        0.50      0.32            0.08
Arg     CGG     0.04        0.14      0.05            0.14
Arg     CGA     0.09        0.14      0.09            0.04
Arg     CGT     0.23        0.09      0.23            0.11
Arg     CGC     0.11        0.05      0.09            0.36
Gln     CAG     0.38        0.18      0.39            0.43
Gln     CAA     0.62        0.82      0.61            0.57
His     CAT     0.52        0.90      0.50            0.38
His     CAC     0.48        0.10      0.50            0.62
Leu     TTG     0.26        0.08      0.27            0.15
Leu     TTA     0.10        0.46      0.12            0.04
Leu     CTG     0.09        0.04      0.10            0.27
Leu     CTA     0.08        0.21      0.10            0.11
Leu     CTT     0.29        0.15      0.18            0.16
Leu     CTC     0.19        0.06      0.22            0.27
Pro     CCG     0.07        0.20      0.08            0.20
Pro     CCA     0.44        0.56      0.44            0.39
Pro     CCT     0.32        0.24      0.32            0.19
Pro     CCC     0.16        0.00      0.16            0.22

Bt coding sequences publicly available and 38 coding
sequences of dicot nuclear genes were used to compile the
codon usage table. The pooled dicot coding sequences,
obtained from Genbank, were:
GENUS/SPECIES



PROTEIN



Antirrhinum fnajuj 
ArabidaptU thciicm 



20 



25 



20 



35 



40 



Bcnhoiieuc acetic 
Qraitica campaais 
Hrasuca ncput 
0 rast tea oirocea 
Ca^mc/iO tnttfoemts 

Chlamdomo*as 



GfCtuna xornta 



DcUeitcs difhna 



50 



55 



AMACUS 

aTIUUCA 

\THU3C3 

ATUH4CA 

ATItUlCn 

ATIfTUBA 



nOLSLSCK 
CENCONA 
CTA pap 

CR£C35I 

CR£K8CSl 

CR£RBCSZ 

cucpirr 

CUSGMS 

CUSLHCPA 

CUSSSU 

oarcct 

D81LEC3 
FTR8CX 
SOY7SAA 
SO YACnC 

soTcnn 

SOYGtYAlA 
SOYGLYAAB 
SOYCLYAB 
SOYCLYU 

SOYHSM75 

SOYLCTl 

SOYIXA 

SOYUOX 

SOYNO020C 

SOYNO023C 

SOYNO0HH 

SOYKO0U0 

SOVNOOttR 

SOYKO027R 

SOYKOOJSM 

SOYNO07S 

SOYTOORt 

SOYNO0R2 

SOYTOl 

SOYRUBP 

SOYURA 

SOYttSPldA 



Ciatcoac rynihcuse 
Alcohol dehydrogenase 
Itixtonc 3 gene I 
Ketone 3 gene 2 
i (istone 4 Qcnc t 
OG 
a tubulin 

5<.iolpyruvyt«hifaic 3-ph05ptu:c 
:ynthcu&e 

High methionine nora$e p rot cm 
Acyt carrier protein 

S-torus tpetific jt>ropn>iet» 

Cancan jvslm A 

P»ram 

Pre a poeytochrome 
RuUTC small subvnit gene 1 
RuOPC small lubunit gtne 2 
Photochrome 

Glyotsaomal malm rymhcuie 
CAO 

RuQPC uoall subvoil 
Cxtensia 

33 kO exuasia related proteio 
teed tenia 

RuOTC iratfl w&w* 
7S aocmfc piweia 
ACliB 1 

ai pfoca 

Clyctftki AUBsc 
Ctyrau* AiA*Q3 flAvais 
Cl|Cmtfl A3/W wboortx 
Glycsftifl A20U tabttnits 
Low M W heat ttartc protein* 
Lt$hemctfoo* 
Lectin 

Linorycenaac 1 
JOkOa nodutm 

23 kOa nodwtin 

24 kOa «od»*i» 
26 kOa nodufa 

26 kDa nodulia 

27 kOa nod«4ia 
35 kDa nod*** 
73 kOa Aodutta 
Noddin Ol 
NodvtUi EZ7 
PreHiM net) protem 
RvOPCwnaUiwbwnit 
Urease 

Heat chert protein 26A 
NucUar-encoccd chloropUst 
heat snort ptcxcui 
22 kDa nodulia 
3\ tubulin 
ffl tubulin 
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Lltf*r-**i luteut 

Lycjp<rticon 
cicuLciuum 



15 



20 



2$ 



M ncrobryartHKm 

ayiuitumm 

Htcoriana 



Nicotk 



35 



fienemie 
Pttmntm tp 



U;tAlU9 

t.UPLBtt 

TOMMIOItR 

TOMETUYBR 

TOMPC2AR 

TOMrSI 

TOMKJJCSa 

TOMRBC50 

T0MRBC5C 

TOMRJJCSD 

TOMRRD 

TONfWIPtC 

T0.\n\ipii 



ALFUJJR 



PO0ATT21 



T00EC1C 

TOCGAPA 

TOOCAPB 

TO OCA PC 

TOSnUAR 

TOOnttCR 

TOBfRTR 

TOOrXDLF 

TOORUPCO 

TOBTIUUR 

aVOCEL 

rttoaiL 

pctcabu 

pctcaoul 

rcrcAOUR 

rcrcABii 

pctcaojt 

PCTCAOftR 

PCTCitsn 

PCTCCRl 

PCTRBCSW 

PCTOCSU 

PHVaiM 

P!IVDL£CA 

niVULECB 

I'tCVCSRJ 

r:i\x:sR2 



; tii'f; • m.-.u mi it. 

.lilii.ntn -Hir-:^ ;tr 

'. .'.vin»;..c<Juf*) .?^(jl3V 

CaO 

t'wQPC snull SuOunn 
I'StiefroglOOin I 

lltona bind protein 
Uihylcac Dtcsyntncsis protein 
Pat vgauciv ro niie -2a 
Tomato ptiotosysicm I proicm 
RuQPC small subunn 
RuQPC ur.ail suountt 
RuOPC small suuumt 
RuOPC snull wound 
Rjpentnf related proicm 
Wound induced protcmuc 
initiator I 

'Voimtf induced proteinase 

inhibitor (I 

O\0 IA 

CAO 10 

CAO 3C 

CAU4 

CAO 3 

Lcjhcmo^Joom (tl 

RuOrC uruU sutmnu 

Mitochondrial ATT ryntftaae 

Nitrite redveuac 
ClutMtac fyotfecuic 



10 
10 
10 
1 1 

M 
12 



13 
14 



A s«o«rut of ctUoropUa G3PD 
0 subwnic of ddotoplaa G3FD 
C tu6«M of ctUoropUs G3PO 
Pathoceacais related proton U 
Paihogeacu-rciafcd proton 1c 
Pathofntwa related prouin lb 
Pcrawdaae 

RuOTC unafl tvovmt 
TMV^nduccd protein homo l ogoy* 

IO tfUWfftJUA 

Cellulate 

CAO 13 

CA0 2ZL 

CA022R 

CAO 2S 

CAO 37 

CA091R 

CfcaJconc tyntruae 

Glycine -rich proicm 

RuOTC small mouait 

RuOTC tnutt cuotieu 

70 kDa heat ehock protein 



15 



Ph^onera agglutinin G 
Phytohem agglutinin L 
Glvununc tymnet i« I 
GUumme synthetase 1 
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r^;.: \ (COM":!: 1 -". 



20 



25 



Rianus cammunii 



Sitae pratnxts 

Sinapa aiba 
Solatium iuocrosum 



J3 



40 



Spt/uaaa otams 



#.:i-:n:».\kk 

hivuxt 
piivtac 
ruvniASAR 
ruvpiusnn 



PEAALD1 

PEACaMO 

PEAGSRi 

PEaUCA 

PEaLECA 

PEARUBPS 

PEA vi CI 

PSA VIC* 

PEAVIO 



RCCaGC 

RCCRICIN 

RCCICW 

SIPTOX 

SIPPCY 

POTTAT 
POTTNKW1 

POTLSlC 

poimc 

POTRBCS 

srucpt 
srtoea* 

snoecxs 

snrcc 
snrss 



VTA LB A 
VTALEB4 



I'henyUtanine lmmoou lyi*C 
a pAucdm 

Afcciin «ed pcotcio 
Chitconc synthase 
Seed Albumin 
CAB 

Cluumioc tyoitwtise (nodule) 

Lectin 

Lejumta 

flu C PC small tuDumi 

Victim 

violin 

Viciiin 

Alcohol deh>drof«use I 
Cluuminc lyntbeuse (leaf) 
Giuuminc tycwfcuM (woi) 
Hittem 1 

Nuclear encoded chJoraplm 
hcu thocfc prouin 
RuOPC«nalii»bwM 

Ricin 

liociuite ty»e 
Pcrrodona piwnor 
ptisjacytnb* pneunor 
NweteMceiwferGSPt) 
Pauxtfl 

Wowad-indMCcd pioumsae 

Lifht.«d«nbte tone sperific 

Wo»nd«<ndoccd pwt crntw 
inttibnor U 
RaOrCcfluOsMbMii 



Mil 



14 kD* 



23kDt 



PUoecyasiA 

obdtuoB eompio pawnor 
Glycoutc oodaat 

Lcjvmin 0 
ViciUia 



!6 
17 



18 
19 
19 
20 
4 

21 



22 



23 
24 



so 



55 
Table 1 (CONTINUED)



GENUS/SPECIES



GENBANK



PROTEIN



REF



I fart aim w/£«c 



Triiiatm aer/Awti 



taafroatafe 
Zramoyx 



dlyalr 

BLYAMYl 

0LYAMY2 

OLYCUORDl 

GLYCLUCB 

OLYTlORfl 

0UYTAP1 

OLYUDIQR 



RJCCLLTC 

WHTCAO 

VVHTEMR 

WifTClR 

WWTCUJS 

WHTCUABA 

wirrcum 

WHTIMm 
WHTRBCB 

RYESECCSR 
M2JEAIC 

M2ZKCTIG 
MZEAOHttF 
M2£aDH2NR 
MXEUD 

M7FTC7R 
MZECCSDB 

MIQMCX4 

Miotsrrot 
M7Xitsn« 
stzfijicr 

M7.EMIU 
MZETETCR 
Ml£RflCS 
MZ ESUSY SC 

MZLiru 

MZEZ&UOM 

MZEZ&tMM 

MZEZE1SU 

MZ£Z£M 

MZEZEWa 

M7F7FT1A 

MZEZEZ20 



PiiyuxhfOme ) 
Atcura:n 
a atn^ue 1 
a am ?t lie 2 
Hordem C 

0 gtucanase 

01 hordem 

Amylase/ protease mftibnor 
Ton a a hordoiittorun 
UbiquiUa 
I (atone 3 

Leaf specific ifitocun 1 

Leaf specific thiomn 2 ' 

rtastocytmn 

Glutetm 

CUwim 

a amylase 

CAB 

Cm protein 

pbbcrdlin resporUM protein 
-r jtUdw 

a/0 pixUa CUtt All 
High MW shitcaM 
MiooaO 
liisteac* 
RuOrCsffliU 
itectlia 
40.1 tOAl 
depcadem redaewe) 
Aetia 

Alcohol deitydrofeaase I 
Akohol dcftydxoccoae 2 



25 
26 
26 
27 

28 



(NADPH- 



0«teiM2 

Gluwto* S mmfcTiac 
Uoieae4 

70 tO Heat chocfe prwcm, uoa I 
70 tO Ncai chock proccxa, e*o* 2 
CAO 

Lipid body utrbcv protein L3 

Phospheenctyfvwc carooxyuac 

RuOrCinuafMOWM 

S«ciom fynthctiM 

TnoaephotpMic «omera*e 1 

l9UOxcm 

l«fcOtcm 

15k0xaa 

t6kOtcsa 

19 kD tctn 

22 tl> vtim 

22 WO tein 

Oulase 2 

Rc*uUtoryCt torwa 



29 
30 
Bt codons were obtained from analysis of coding sequences
of the following genes: Bt var. kurstaki HD-73, 6.6 kb
HindIII fragment (Kronstad et al. (1983) J. Bacteriol.
154:419-428); Bt var. kurstaki HD-1, 5.3 kb fragment (Adang
et al. (1987) in Biotechnology in Invertebrate Pathology
and Cell Culture, K. Maramorosch (ed.), Academic Press, Inc.,
New York, pp. 85-99); Bt var. kurstaki HD-1, 4.5 kb
fragment (Schnepf and Whiteley (1983) J. Biol. Chem.
260:6273-6280); and Bt var. tenebrionis, 3.0 kb HindIII
fragment (Sekar et al. (1987) Proc. Natl. Acad. Sci.
84:7036-7040).



1. Klee, H.J. et al. (1987) Mol. Gen. Genet. 210:437-442.

2. Altenbach, S.B. et al. (1987) Plant Mol. Biol. 8:239-250.

3. Rose, R.S. et al. (1987) Nucl. Acids Res. 15:7197.

4. Vierling, E. et al. (1988) EMBO J. 7:575-581.

5. Sandal, M.N. et al. (1987) Nucl. Acids Res. 15:1507-1519.

6. Tingey, S.V. et al. (1987) EMBO J. 6:1-9.

7. Oilan, C.A. et al. (1987) Plant Mol. Biol. 9:533-546.

8. Allen, R.D. et al. (1987) Mol. Gen. Genet. 210:211-218.

9. Sakajo, S. et al. (1987) Eur. J. Biochem. 165:437-442.

10. Pichersky, E. et al. (1987) Plant Mol. Biol. 9:109-120.

11. Ray, J. et al. (1987) Nucl. Acids Res. 15:10587.

12. DeRocher, E.J. et al. (1987) Nucl. Acids Res. 15:6301.

13. Calza, R. et al. (1987) Mol. Gen. Genet. 209:552-562.

14. Tingey, S.V. and Coruzzi, G.M. (1987) Plant Phys. 84:366-373.

15. Winter, J. et al. (1988) Mol. Gen. Genet. 211:315-319.

16. Osborn, T.C. et al. (1988) Science 240:207-210.

18. Llewellyn, D.J. et al. (1987) J. Mol. Biol. 195:115-

19. Tingey, S.V. et al. (1987) EMBO J. 6:1-9.

20. Gantt, J.S. and Key, J.L. (1987) Eur. J. Biochem. 166:119-125.

21. Guidet, F. and Fourcroy, P. (1988) Nucl. Acids Res. 16:2336.

22. Salanoubat, M. and Belliard, G. (1987) Gene 60:47-56.

23. Volokita, M. and Somerville, C.R. (1987) J. Biol. Chem. 262:15825-15828.

24. Bassner, R. et al. (1987) Nucl. Acids Res. 15:9609.

25. Chojecki, J. (1986) Carlsberg Res. Commun. 51:211-217.

26. Bohlmann, H. and Apel, K. (1987) Mol. Gen. Genet. 207:446-454.

27. Nielsen, P.S. and Gausing, K. (1987) FEBS Lett. 225:159-162.

28. Higuchi, W. and Fukazawa, C. (1987) Gene 55:245-253.

29. Bethards, L.A. et al. (1987) Proc. Natl. Acad. Sci. USA 84:6830-6834.

30. Paz-Ares, J. et al. (1987) EMBO J. 6:3553-3558.



For example, dicots utilize the AAG codon for lysine with a frequency of 61% and the AAA codon with a
frequency of 39%. In contrast, in Bt proteins the lysine codons AAG and AAA are used with a frequency of
13% and 87%, respectively. It is known in the art that seldom used codons are generally detrimental to that
system and must be avoided or used judiciously. Thus, in designing a synthetic gene encoding the Btt
crystal protein, individual amino acid codons found in the original Btt gene are altered to reflect the codons
preferred by dicot genes for a particular amino acid. However, attention is given to maintaining the overall
distribution of codons for each amino acid within the coding region of the gene. For example, in the case of
alanine, it can be seen from Table 1 that the codon GCA is used in Bt proteins with a frequency of 50%,
whereas the codon GCT is the preferred codon in dicot proteins. In designing the synthetic Btt gene, not all
codons for alanine in the original Bt gene are replaced by GCT; instead, only some alanine codons are
changed to GCT while others are replaced with different alanine codons in an attempt to preserve the
overall distribution of codons for alanine used in dicot proteins. Column C in Table 1 documents that this
goal is achieved: the frequency of codon usage in dicot proteins (column A) corresponds very closely to
that used in the synthetic Btt gene (column C).
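
One way to picture the preservation of an overall codon distribution is the sketch below. It is a hypothetical illustration, not the design procedure used for the synthetic Btt gene: it apportions the alanine codons of a toy coding region among the four alanine codons according to the dicot fractions in column A of Table 1, using largest-remainder rounding, and ignores every other design constraint. The function names and the toy input are assumptions.

```python
def allocate_codons(n, target):
    """Split n occurrences among codons in proportion to target fractions.

    Fractions are normalised first (published values may not sum to exactly
    1.00), then rounded with the largest-remainder method so counts sum to n.
    """
    total = sum(target.values())
    raw = {codon: n * frac / total for codon, frac in target.items()}
    alloc = {codon: int(x) for codon, x in raw.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)
    for codon in by_remainder[:leftover]:
        alloc[codon] += 1
    return alloc

def recode_alanines(codons, target):
    """Replace the alanine codons of a codon list to match the target distribution."""
    alanine = {"GCG", "GCA", "GCT", "GCC"}
    positions = [i for i, c in enumerate(codons) if c in alanine]
    alloc = allocate_codons(len(positions), target)
    # Flatten the allocation into a pool; placement order here is arbitrary.
    pool = [c for c, k in sorted(alloc.items(), key=lambda kv: -kv[1]) for _ in range(k)]
    recoded = list(codons)
    for pos, new_codon in zip(positions, pool):
        recoded[pos] = new_codon
    return recoded

# Dicot alanine fractions from Table 1, column A.
dicot_ala = {"GCG": 0.05, "GCA": 0.26, "GCT": 0.42, "GCC": 0.28}

# Toy input: a start codon followed by twenty GCA codons (not the Btt sequence).
toy_codons = ["ATG"] + ["GCA"] * 20
recoded = recode_alanines(toy_codons, dicot_ala)
print(allocate_codons(20, dicot_ala))  # {'GCG': 1, 'GCA': 5, 'GCT': 8, 'GCC': 6}
```

Only some of the GCA codons become GCT; the remainder are spread over the other alanine codons so that the recoded region approximates the dicot fractions rather than using the single most preferred codon throughout.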

In similar manner, a synthetic gene coding for insecticidal crystal protein can be optimized for
enhanced expression in monocot plants. In Table 1, column D, is presented the frequency of codon usage
of highly expressed monocot proteins.

Because of the degenerate nature of the genetic code, only part of the variation contained in a gene is
expressed in its protein. It is clear that variation between degenerate base frequencies is not a neutral
phenomenon since systematic codon preferences have been reported for bacterial, yeast and mammalian
genes. Analysis of a large group of plant gene sequences indicates that synonymous codons are used
differently by monocots and dicots. These patterns are also distinct from those reported for E. coli, yeast
and man.

In general, the plant codon usage pattern more closely resembles that of man and other higher
eukaryotes than unicellular organisms, due to the overall preference for G+C content in codon position III.
Monocots in this sample share the most commonly used codon for 13 of 18 amino acids as that reported
for a sample of human genes (Grantham et al. (1986) supra), although dicots favor the most commonly
used human codon in only 7 of 18 amino acids.

Discussions of plant codon usage have focused on the differences between codon choice in plant
nuclear genes and in chloroplasts. Chloroplasts differ from higher plants in that they encode only 30 tRNA
species. Since chloroplasts have restricted their tRNA genes, the use of preferred codons by chloroplast-
encoded proteins appears more extreme. However, a positive correlation has been reported between the
level of isoaccepting tRNA for a given amino acid and the frequency with which this codon is used in the
chloroplast genome (Pfitzinger et al. (1987) Nucl. Acids Res. 15:1377-1386).

Our analysis of the plant genes sample confirms earlier reports that the nuclear and chloroplast
genomes in plants have distinct coding strategies. The codon usage of monocots in this sample is distinct
from chloroplast usage, sharing the most commonly used codon for only 1 of 18 amino acids. Dicots in this
sample share the most commonly used codon of chloroplasts in only 4 of 18 amino acids. In general, the
chloroplast codon profile more closely resembles that of unicellular organisms, with a strong bias towards
the use of A + T in the degenerate third base.

In unicellular organisms, highly expressed genes use a smaller subset of codons than do weakly
expressed genes, although the codons preferred are distinct in some cases. Sharp and Li (1986) Nucl. Acids
Res. 14:7734-7749 report that codon usage in 165 E. coli genes reveals a positive correlation between high
expression and increased codon bias. Bennetzen and Hall (1982) supra have described a similar trend in
codon selection in yeast. Codon usage in these highly expressed genes correlates with the abundance of
isoaccepting tRNAs in both yeast and E. coli. It has been proposed that the good fit of abundant yeast and
E. coli mRNA codon usage to isoacceptor tRNA abundance promotes high translation levels and high
steady state levels of these proteins. This strongly suggests that the potential for expression
of plant genes in yeast or E. coli is limited by their codon usage. Hoekema et al. (1987) supra report that
replacement of the 25 most favored yeast codons with rare codons in the 5' end of a highly expressed
gene leads to a decrease in both mRNA and protein. These results indicate that codon bias should
be emphasized when engineering high expression of foreign genes in yeast and other systems.



(iii) Sequences within the Btt coding region having potentially destabilizing influences

Analysis of the Btt gene reveals that the A + T content represents 64% of the DNA base composition
of the coding region (Table 3). This level of A + T is about 10% higher than that found in a typical plant
coding region. Most often, high A + T regions are found in intergenic regions. Also, many plant regulatory
sequences are observed to be AT-rich. These observations lead to the consideration that an elevated A + T
content within the Btt coding region may be contributing to a low expression level in plants. Consequently,
in designing the synthetic Btt gene, the A + T content is lowered to a level in keeping with that
found in coding regions of plant nuclear genes. The synthetic Btt gene of this invention
has an A + T content of 55%.
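
The percentages quoted above can be checked against the base counts of Table 3. The short sketch below
assumes the counts are assigned to bases in the column order G, A, T, C given in that table; the function
name is illustrative.

```python
def base_composition(counts):
    """Return (total bases, %G+C, %A+T), percentages rounded to whole numbers."""
    total = sum(counts.values())
    gc = 100 * (counts["G"] + counts["C"]) / total
    at = 100 * (counts["A"] + counts["T"]) / total
    return total, round(gc), round(at)


# Base counts as read from Table 3 (column order G, A, T, C).
natural = {"G": 341, "A": 633, "T": 514, "C": 306}
synthetic = {"G": 392, "A": 530, "T": 483, "C": 428}

for name, counts in (("Natural Btt gene", natural), ("Synthetic Btt gene", synthetic)):
    total, gc, at = base_composition(counts)
    print(f"{name}: {total} bases, {gc}% G+C, {at}% A+T")
```

The totals, 1794 and 1833 bases, are close to the nucleotide spans recited for Figure 1 in the claims.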

Table 3

Adenine + Thymine Content in Btt Coding Region

                              Base
Coding Region          G     A     T     C     %G+C   %A+T
Natural Btt gene      341   633   514   306     36     64
Synthetic Btt gene    392   530   483   428     45     55

In addition, the natural Btt gene is scanned for sequences that are potentially destabilizing to Btt RNA.
These sequences, when identified in the original Btt gene, are eliminated through modification of nucleotide
sequences. Included in this group of potentially destabilizing sequences are:

(a) plant polyadenylation signals (as described by Joshi (1987) Nucl. Acids Res. 15:9627-9640). In
eukaryotes, the primary transcripts of nuclear genes are extensively processed (steps including 5'-
capping, intron splicing, polyadenylation) to form mature and translatable mRNAs. In higher plants,
polyadenylation involves endonucleolytic cleavage at the polyA site followed by the addition of several A
residues to the cleaved end. The selection of the polyA site is presumed to be cis-regulated. During
expression of Bt protein and RNA in different plants, the present inventors have observed that the
polyadenylated mRNA isolated from these expression systems is not full-length but instead is truncated or
degraded. Hence, in the present invention it was decided to minimize possible destabilization of RNA
through elimination of potential polyadenylation signals within the coding region of the synthetic Btt gene.
Plant polyadenylation signals including AATAAA, AATGAA, AATAAT, AATATT, GATAAA, GATAAA, and
AATAAG motifs do not appear in the synthetic Btt gene when scanned for 0 mismatches of the sequences.
(b) polymerase II termination sequence, CAN7-9AGTNNAA. This sequence was shown (Vankan and
Filipowicz (1988) EMBO J. 7:791-799) to be next to the 3' end of the coding region of the U2 snRNA genes
of Arabidopsis thaliana and is believed to be important for transcription termination upon 3' end processing.
The synthetic Btt gene is devoid of this termination sequence.

(c) CUUCGG hairpins, responsible for extraordinarily stable RNA secondary structures associated
with various biochemical processes (Tuerk et al. (1988) Proc. Natl. Acad. Sci. 85:1364-1368). The
exceptional stability of CUUCGG hairpins suggests that they have an unusual structure and may function in
organizing the proper folding of complex RNA structures. CUUCGG hairpin sequences are not found with
either 0 or 1 mismatches in the Btt coding region.

(d) plant consensus splice sites, 5' = AAG:GTAAGT and 3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)-
TGCAG:C, as described by Brown et al. (1986) EMBO J. 5:2749-2758. Consensus sequences for the 5' and
3' splice junctions have been derived from 20 and 30 plant intron sequences, respectively. Although it is not
likely that such potential splice sequences are present in Bt genes, a search was initiated for sequences
resembling plant consensus splice sites in the synthetic Btt gene. For the 5' splice site, the closest match
was with three mismatches. This gave 12 sequences of which two had G:GT. Only position 948 was
changed because 1323 has the KpnI site needed for reconstruction. The 3' splice site is not found in the
synthetic Btt gene.

Thus, by highlighting potential RNA-destabilizing sequences, the synthetic Btt gene is designed to
eliminate known eukaryotic regulatory sequences that affect RNA synthesis and processing.
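
A scan of this kind, for exact or near-exact occurrences of the motifs listed in (a) through (c), could be
sketched as below. This is an illustrative sketch and not the inventors' procedure: the function names are
hypothetical, only the sense strand is examined, mismatches are counted as substitutions, and
CAN7-9AGTNNAA is read as CA, then 7 to 9 arbitrary bases, then AGT, two arbitrary bases and AA.

```python
import re

# Motifs named in the text (GATAAA is listed twice there; once suffices here).
POLYA_SIGNALS = ["AATAAA", "AATGAA", "AATAAT", "AATATT", "GATAAA", "AATAAG"]
# Lookahead so that overlapping occurrences are all reported.
POL2_TERMINATOR = re.compile(r"(?=CA[ACGT]{7,9}AGT[ACGT]{2}AA)")


def find_motif(seq, motif, max_mismatches=0):
    """Return 0-based start positions where motif matches seq with at most
    max_mismatches substitutions."""
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def scan(seq):
    """Report positions of each potentially destabilizing motif in seq."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA notation
    report = {motif: find_motif(seq, motif) for motif in POLYA_SIGNALS}
    report["CTTCGG, 0 or 1 mismatch"] = find_motif(seq, "CTTCGG", 1)
    report["CAN7-9AGTNNAA"] = [m.start() for m in POL2_TERMINATOR.finditer(seq)]
    return report


# Short made-up sequence containing one AATAAA signal.
example = "GGGAATAAACCC"
print(scan(example))
```

A coding region is considered free of a motif when the corresponding list in the report is empty.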

Example 2. Chemical synthesis of a modified Btt structural gene

(i) Synthesis Strategy

The general plan for synthesizing linear double-stranded DNA sequences coding for the crystal protein
from Btt is schematically simplified in Figure 2. The optimized DNA coding sequence (Figure 1) is divided
into thirteen segments (segments A-M) to be synthesized individually, isolated and purified. As shown in
Figure 2, the general strategy begins by enzymatically joining segments A and M to form segment AM, to
which is added segment BL to form segment ABLM. Segment CK is then added enzymatically to make
segment ABCKLM, which is enlarged through addition of segments DJ, EI and FGH sequentially to give
finally the total segment ABCDEFGHIJKLM, representing the entire coding region of the Btt gene.
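
The outside-in order of assembly described above can be traced with a few lines of code. This is only a
bookkeeping sketch: the block names DJ, EI and FGH are taken as the ones implied by the final order
ABCDEFGHIJKLM, and the function name is hypothetical.

```python
def assemble(steps):
    """Grow the construct from the outside in: each new block is spliced in at
    the junction between the left and right halves of the current construct."""
    left, right = "", ""
    for block_left, block_right in steps:
        left += block_left
        right = block_right + right
        print(left + right)
    return left + right


# Each tuple holds the left and right part of the block added at that step;
# segments F, G and H enter together as the last block.
steps = [("A", "M"), ("B", "L"), ("C", "K"), ("D", "J"), ("E", "I"), ("FGH", "")]
gene = assemble(steps)
```

The intermediate constructs printed are AM, ABLM, ABCKLM, ABCDJKLM and ABCDEIJKLM, followed by the
complete coding region.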

Figure 3 outlines in more detail the strategy used in combining individual DNA segments in order to
effect the synthesis of a gene having unique restriction sites integrated into a defined nucleotide sequence.
Each of the thirteen segments (A to M) has unique restriction sites at both ends, allowing the segment to be
strategically spliced into a growing DNA polymer. Also, unique sites are placed at each end of the gene to
enable easy transfer from one vector to another.

The thirteen segments (A to M) used to construct the synthetic gene vary in size. Oligonucleotide pairs
of approximately 75 nucleotides each are used to construct larger segments having approximately 225
nucleotide pairs. Figure 3 documents the number of base pairs contained within each segment and
specifies the unique restriction sites bordering each segment. Also, the overall strategy to incorporate
specific segments at appropriate splice sites is detailed in Figure 3.

(ii) Preparation of oligodeoxynucleotides

Preparation of oligodeoxynucleotides for use in the synthesis of a DNA sequence comprising a gene for
Btt is carried out according to the general procedures described by Matteucci et al. (1981) J. Am. Chem.
Soc. 103:3185-3192 and Beaucage et al. (1981) Tetrahedron Lett. 22:1859-1862. All oligonucleotides are
prepared by the solid-phase phosphoramidite triester coupling approach, using an Applied Biosystems
Model 380A DNA synthesizer. Deprotection and cleavage of the oligomers from the solid support are
carried out according to standard procedures. Crude oligonucleotide mixtures are purified using an
oligonucleotide purification cartridge (OPC, Applied Biosystems) as described by McBride et al. (1988)
Biotechniques 6:362-367.

5'-phosphorylation of oligonucleotides is performed with T4 polynucleotide kinase. The reaction
contains 2 µg oligonucleotide and 18.2 units polynucleotide kinase (Pharmacia) in linker kinase buffer (Maniatis
(1982) Cloning Manual, Fritsch and Sambrook (eds.), Cold Spring Harbor Laboratory, Cold Spring Harbor,
NY). The reaction is incubated at 37°C for 1 hour.

Oligonucleotides are annealed by first heating to 95°C for 5 min and then allowing complementary
pairs to cool slowly to room temperature. Annealed pairs are reheated to 65°C, solutions are combined,
cooled slowly to room temperature and kept on ice until used. The ligated mixture may be purified by
electrophoresis through a 4% NuSieve agarose (FMC) gel. The band corresponding to the ligated duplex is
excised, the DNA is extracted from the agarose and ethanol precipitated.

Ligations are carried out as exemplified by that used in M segment ligations. M segment DNA is
brought to 65°C for 25 min, the desired vector is added and the reaction mixture is incubated at 65°C for
15 min. The reaction is slow cooled over 1 1/2 hours to room temperature. ATP to 0.5 mM and 3.5 units of
T4 DNA ligase salts are added and the reaction mixture is incubated for 2 hr at room temperature and then
maintained overnight at 15°C. The next morning, vectors which had not been ligated to M block DNA were
removed upon linearization by EcoRI digestion. Vectors ligated to the M segment DNA are used to
transform E. coli MC1061. Colonies containing inserted blocks are identified by colony hybridization
with 32P-labelled oligonucleotide probes. The sequence of the DNA segment is confirmed by isolating
plasmid DNA and sequencing using the dideoxy method of Sanger et al. (1977) Proc. Natl. Acad. Sci.
74:5463-5467.


(iii) Synthesis of Segment AM 

Three oligonucleotide pairs (A1 and its complementary strand A1c, A2 and A2c, and A3 and A3c) are
assembled and ligated as described above to make up segment A. The nucleotide sequence of segment A
is presented in Table 4.

In Table 4, bold lines demarcate the individual oligonucleotides. Fragment A1 contains 71 bases, A1c has
76 bases, A2 has 75 bases, A2c has 76 bases, A3 has 82 bases and A3c has 76 bases. In all, segment A
is composed of 228 base pairs and is contained between an EcoRI restriction enzyme site and one destroyed
EcoRI site (5'). (Additional restriction sites within segment A are indicated.) The EcoRI single-stranded
cohesive ends allow segment A to be annealed and then ligated to the EcoRI-cut cloning vector, pIC20K.

Segment M comprises three oligonucleotide pairs: M1, 80 bases; M1c, 86 bases; M2, 87 bases; M2c,
87 bases; M3, 85 bases; and M3c, 79 bases. The individual oligonucleotides are annealed and ligated
according to standard procedures as described above. The overall nucleotide sequence of segment M is:



EP 0 359 472 A2

[Table 5: nucleotide sequence of segment M; the sequence is not recoverable from this copy.]

In Table 5, bold lines demarcate the individual oligonucleotides. Segment M contains 252 base pairs and
has destroyed EcoRI restriction sites at both ends. (Additional restriction sites within segment M are
indicated.) Segment M is inserted into vector pIC20R at an EcoRI restriction site and cloned.
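
The oligonucleotide lengths quoted for segments A and M can be cross-checked against the stated segment
sizes, since the oligonucleotides of each strand should add up to the full segment. The sketch below assumes
that M3 is 85 bases, the reading under which both strands of segment M total 252; the dictionary layout and
function name are illustrative.

```python
# Oligonucleotide lengths per strand and stated segment size, from the text.
SEGMENTS = {
    "A": {"top": [71, 75, 82], "bottom": [76, 76, 76], "length_bp": 228},
    "M": {"top": [80, 87, 85], "bottom": [86, 87, 79], "length_bp": 252},
}


def strands_consistent(segment):
    """True when both strands add up to the stated segment length."""
    return sum(segment["top"]) == segment["length_bp"] == sum(segment["bottom"])


for name, segment in SEGMENTS.items():
    print(name, sum(segment["top"]), sum(segment["bottom"]), strands_consistent(segment))
```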

As proposed in Figure 3, segment M is joined to segment A in the plasmid in which it is contained.
Segment M is excised at the flanking restriction sites from its cloning vector and spliced into pIC20K,
harboring segment A, through successive digestions with HindIII followed by BglII. The pIC20K vector now
comprises segment A joined to segment M with a HindIII site at the splice site (see Figure 3). Plasmid
pIC20K is derived from pIC20R by removing the ScaI-NdeI DNA fragment and inserting a HincII fragment
containing an NPTI coding region. The resulting plasmid of 4.44 kb confers resistance to kanamycin on E.
coli.

Example 3. Expression of synthetic crystal protein gene in bacterial systems

The synthetic Btt gene is designed so that it is expressed in the pIC20R-kan vector in which it is
constructed. This expression is produced utilizing the initiation methionine of the lacZ protein of pIC20K.
The wild-type Btt crystal protein sequence expressed in this manner has full insecticidal activity. In addition,
the synthetic gene is designed to contain a BamHI site 5' proximal to the initiating methionine codon and a
BglII site 3' to the terminal TAG translation stop codon. This facilitates the cloning of the insecticidal crystal
protein coding region into bacterial expression vectors such as pDR540 (Russell and Bennett, 1982).
Plasmid pDR540 contains the TAC promoter which allows the production of proteins including Btt crystal
protein under controlled conditions in amounts up to 10% of the total bacterial protein. This promoter
functions in many gram-negative bacteria including E. coli and Pseudomonas.

Production of Bt insecticidal crystal protein from the synthetic gene in bacteria demonstrates that the
protein produced has the expected toxicity to coleopteran insects. These recombinant bacterial strains in
themselves have potential value as microbial insecticides, product of the synthetic gene.

Example 4. Expression of a synthetic crystal protein gene in plants

The synthetic Btt crystal protein gene is designed to facilitate cloning into the expression cassettes.
These utilize sites compatible with the BamHI and BglII restriction sites flanking the synthetic gene.
Cassettes are available that utilize plant promoters including CaMV 35S, CaMV 19S and the ORF 24
promoter from T-DNA. These cassettes provide the recognition signals essential for expression of proteins
in plants. These cassettes are utilized in the micro Ti plasmids such as pH575. Plasmids such as pH575
containing the synthetic Btt gene directed by plant expression signals are utilized in disarmed
Agrobacterium tumefaciens to introduce the synthetic gene into plant genomic DNA. This system has been
described previously by Adang et al. (1987) to express Bt var. kurstaki crystal protein gene in tobacco
plants. These tobacco plants were toxic to feeding tobacco hornworms.



Example 5. Assay for insecticidal activity

Bioassays were conducted essentially as described by Sekar, V. et al. supra. Toxicity was assessed by
an estimate of the LD50. Plasmids were grown in E. coli JM105 (Yanisch-Perron, C. et al. (1985) Gene
33:103-119). On a molar basis, no significant differences in toxicity were observed between crystal proteins
encoded by p544Pst-Met5, p544-HindIII, and pNSBP544. When expressed in plants under identical
conditions, cells containing protein encoded by the synthetic gene were observed to be more toxic than
those containing protein encoded by the native Btt gene. Immunoblots ("western" blots) of cell cultures
indicated that those that were more toxic had more crystal protein antigen. Improved expression of the
synthetic Btt gene relative to that of a natural Btt gene was seen as the ability to quantitate specific mRNA
transcripts from expression of synthetic Btt genes on Northern blot assays.



Claims 



1. A synthetic gene designed to be highly expressed in plants comprising a DNA sequence encoding
an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt.

2. A synthetic gene of claim 1 wherein said DNA sequence is at least about 85% homologous to a
native insecticidal protein gene of Btt.

3. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1, spanning
nucleotides 1 through 1793.

4. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1, spanning
nucleotides 1 through 1833.

5. A synthetic gene of claim 1 wherein the overall frequency of preferred codon usage within the entire
coding region of said synthetic gene is within about 75% of the frequency of codon usage preferred in
plants.

6. A synthetic gene of claim 1 wherein the A + T base content of said DNA sequence is substantially
equal to the A + T base content found in plant structural genes.

7. A synthetic gene of claim 1 wherein a plant initiation sequence is present at the 5' end of the coding
region.
8. A synthetic gene of claim 1 wherein plant polyadenylation signals, comprising those having
AATAAA, AATGAA, AATAAT, AATATT, GATAAA, GATAAA and AATAAG motifs, are eliminated in said DNA
sequence.

9. A synthetic gene of claim 1 wherein the polymerase II termination sequence, CAN7-9AGTNNAA, is
eliminated in said DNA sequence.

10. A synthetic gene of claim 1 wherein CUUCGG hairpins are eliminated in said DNA sequence.

11. A synthetic gene of claim 1 wherein plant consensus splice sites, including 5' = AAG:GTAAGT and
3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C, are eliminated in said DNA sequence.

12. A synthetic gene of claim 1 wherein the CG and TA doublet avoidance indices are substantially
equal to that of highly expressed genes in the selected host plant.

13. A recombinant DNA cloning vector comprising said synthetic gene of claim 1.

14. A plant cell which contains the synthetic gene of claim 1.

15. An improved method of producing a protein toxic to an insect comprising the step of introducing
into a host plant cell a DNA segment comprising a synthetic gene designed to be highly expressed in
plants comprising a DNA sequence encoding an insecticidal protein which is functionally equivalent to a
native insecticidal protein of Bt such that said synthetic gene is expressed in said plant host.

[Figure 1: DNA coding sequence of the synthetic Btt gene, numbered in intervals of 100 nucleotides, with
the encoded amino acid sequence; the sequence is not recoverable from this copy.]

Figure 2. Simplified Schema for Synthesis of Btt Gene

Coding region of synthetic Btt gene
divided into 13 segments